Robust Co-occurrence Quantification for Lexical Distributional Semantics
Abstract
Previous optimisations of the parameters of the word-context association measure used in distributional vector space models have focused either on high-dimensional vectors with hundreds of thousands of dimensions or on dense vectors with a few hundred dimensions. Vectors with a few thousand dimensions, however, are often used in compositional tasks, because they remain computationally feasible and do not require a dimensionality reduction step. We present a systematic study of the interaction between the parameters of the association measure and vector dimensionality, and derive parameter selection heuristics whose performance across word similarity and relevance datasets is competitive with results previously reported in the literature for high-dimensional or dense models.
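To make the notions of an association measure and vector dimensionality concrete, the following is a minimal sketch and not the method of the paper. It assumes a positive pointwise mutual information (PPMI) weighting, which the abstract does not name. The parameters `shift` and `cds`, the `dimensionality` cut-off on the most frequent context words, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs inside a symmetric token window."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


def top_contexts(counts, dimensionality):
    """Pick the most frequent context words; their number is the vector dimensionality."""
    totals = Counter()
    for (_, context), n in counts.items():
        totals[context] += n
    return [context for context, _ in totals.most_common(dimensionality)]


def ppmi_vectors(counts, contexts, shift=1.0, cds=1.0):
    """Build PPMI vectors over the selected contexts.

    shift: assumed parameter; log(shift) is subtracted from every PMI value.
    cds:   assumed parameter; context counts are raised to this power
           before being normalised into a context distribution.
    """
    contexts = list(contexts)
    selected = set(contexts)
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in counts.items():
        if context in selected:
            word_totals[word] += n
            context_totals[context] += n
    total = sum(word_totals.values())
    smoothed = {c: context_totals[c] ** cds for c in contexts}
    smoothed_total = sum(smoothed.values())

    vectors = {}
    for word in word_totals:
        row = []
        for context in contexts:
            n = counts.get((word, context), 0)
            if n == 0:
                row.append(0.0)  # unseen pairs get zero association
                continue
            p_wc = n / total
            p_w = word_totals[word] / total
            p_c = smoothed[context] / smoothed_total
            pmi = math.log(p_wc / (p_w * p_c)) - math.log(shift)
            row.append(max(0.0, pmi))  # keep only positive associations
        vectors[word] = row
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy usage: three-dimensional vectors from a three-sentence corpus.
toy = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
toy_counts = cooccurrence_counts(toy, window=2)
toy_vectors = ppmi_vectors(toy_counts, top_contexts(toy_counts, 3), cds=0.75)
similarity = cosine(toy_vectors["cat"], toy_vectors["dog"])
```

In a sketch like this, the association parameters and the dimensionality are set independently, which is what makes it possible to vary one while holding the other fixed and compare the resulting similarity scores.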
Similar resources
Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs
The existing literature on Bantu verbal semantics demonstrated that inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...
Leveraging Preposition Ambiguity to Assess Compositional Distributional Models of Semantics
Complex interactions among the meanings of words are important factors in the function that maps word meanings to phrase meanings. Recently, compositional distributional semantics models (CDSM) have been designed with the goal of emulating these complex interactions; however, experimental results on the effectiveness of CDSM have been difficult to interpret because the current metrics for asses...
Exploiting Linguistic Knowledge in Lexical and Compositional Semantic Models, by Tong Wang. A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Exploiting Linguistic Knowledge in Lexical and Compositional Semantic Models. Tong Wang, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2016. A fundamental principle in distributional semantic models is to use similarity in linguistic environment as a proxy for similarity in meaning. Known as the distributional hypothesis, the principle has been successfully app...
Modeling Subcategorization through Co-occurrence: a Computational Lexical Resource for Italian Verbs
1. Goals and Methodology. The aim of this abstract is to introduce LexIt, a freely available lexical resource to characterize Italian verb argument properties in terms of distributional information automatically extracted from large corpora with state-of-the-art computational linguistics methods. Research on automatic extraction of subcategorization frames from corpora has a long tradition in co...
Semantic Clustering of Russian Web Search Results: Possibilities and Problems
The present paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs on large Russian corpora and then apply the data to cluster the results of Mail.ru search according to meanings in the query. We compare different methods of performing such clustering and different source corpora. Models of applying distributional semantics to big linguistic data are d...